Estimating the domain of applicability for machine learning QSAR models: a study on aqueous solubility of drug discovery molecules
Abstract
We investigate the use of different machine learning methods to construct models for aqueous solubility. The models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules from Bayer Schering Pharma. For each method, we also consider an appropriate way to obtain error bars, in order to estimate the domain of applicability (DOA) of each model. Specifically, we investigate error bars from a Bayesian model (Gaussian Process, GP), an ensemble-based approach (Random Forest), and approaches based on the Mahalanobis distance to the training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation and on an external validation set of 536 molecules) and in terms of how faithfully the individual error bars represent the actual prediction error.
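The three kinds of error bar named above can be illustrated with a minimal sketch in Python with NumPy. It assumes toy two-dimensional descriptors rather than molecular descriptors, and fixed, hypothetical hyperparameters (kernel length scale, noise level, ridge penalty). It shows the GP predictive standard deviation; the spread of an ensemble, where a bootstrap ensemble of ridge regressions is swapped in as a stand-in for a Random Forest's per-tree spread; and the Mahalanobis distance of a query to the training-set centroid, which is only one common variant and not necessarily the definition used in the paper. All function names (`gp_predict`, `bootstrap_ensemble_predict`, `mahalanobis_to_training`, `coverage`) are hypothetical.

```python
import numpy as np

# Hypothetical sketch: toy 2-D descriptors and fixed hyperparameters,
# not the descriptors, data, or settings used in the paper.


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential (RBF) kernel with unit signal variance."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


def gp_predict(X_train, y_train, X_test, length_scale=1.0, noise=0.1):
    """GP posterior mean and predictive standard deviation (the error bar)."""
    K = rbf_kernel(X_train, X_train, length_scale) + noise**2 * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # k(x, x) = 1 for this kernel; add the noise variance for a predictive error bar.
    var = 1.0 - np.sum(v**2, axis=0) + noise**2
    return mean, np.sqrt(np.maximum(var, 0.0))


def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression; the bias column is penalized too, for brevity."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)


def bootstrap_ensemble_predict(X_train, y_train, X_test, n_models=50, lam=1.0, seed=0):
    """Ensemble mean and spread; a stand-in for a Random Forest's per-tree spread."""
    rng = np.random.default_rng(seed)
    Xb_test = np.hstack([X_test, np.ones((len(X_test), 1))])
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X_train), size=len(X_train))  # bootstrap resample
        preds.append(Xb_test @ ridge_fit(X_train[idx], y_train[idx], lam))
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)


def mahalanobis_to_training(X_train, X_test):
    """Mahalanobis distance of each query to the training-set centroid (assumed variant)."""
    mu = X_train.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X_train, rowvar=False))
    diff = X_test - mu
    return np.sqrt(np.maximum(np.einsum("ij,jk,ik->i", diff, cov_inv, diff), 0.0))


def coverage(y_true, y_pred, y_std, k=1.0):
    """Fraction of compounds whose absolute error lies within k predicted std devs."""
    return float(np.mean(np.abs(y_true - y_pred) <= k * y_std))


rng = np.random.default_rng(42)
X_train = rng.normal(size=(100, 2))
y_train = np.sin(X_train[:, 0]) + 0.5 * X_train[:, 1] + 0.1 * rng.normal(size=100)
# One query near the training data, one far outside it.
X_test = np.array([[0.0, 0.0], [4.0, 4.0]])

gp_mean, gp_std = gp_predict(X_train, y_train, X_test)
ens_mean, ens_std = bootstrap_ensemble_predict(X_train, y_train, X_test)
dist = mahalanobis_to_training(X_train, X_test)
for name, values in [("GP std", gp_std), ("ensemble std", ens_std), ("Mahalanobis", dist)]:
    print(f"{name:>12}: inside={values[0]:.3f}  outside={values[1]:.3f}")

# Check how faithfully the GP error bars reflect actual errors on held-out data.
X_val = rng.normal(size=(200, 2))
y_val = np.sin(X_val[:, 0]) + 0.5 * X_val[:, 1] + 0.1 * rng.normal(size=200)
val_mean, val_std = gp_predict(X_train, y_train, X_val)
print("GP 1-sigma coverage on validation set:", coverage(y_val, val_mean, val_std))
```

All three quantities should grow for the query that lies far from the training data, which is the behaviour a DOA estimate relies on. The `coverage` check is one simple way to ask whether error bars match actual errors: for well-calibrated Gaussian error bars, roughly 68% of absolute errors should fall within one predicted standard deviation.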
Similar articles
Comparative QSAR Analysis of 3,5-bis (Arylidene)-4-Piperidone Derivatives: the Development of Predictive Cytotoxicity Models
1-[4-(2-Alkylaminoethoxy)phenylcarbonyl]-3,5-bis(arylidene)-4-piperidones are a novel class of potent cytotoxic agents. These compounds demonstrate low micromolar to submicromolar IC50 values against human Molt 4/C8 and CEM T-lymphocytes and murine leukemia L1210 cells. In this study, a comparative QSAR investigation was performed on a series of 3,5-bis(arylidene)-4-piperidones using different ...
QSAR Study of 17β-HSD3 Inhibitors by Genetic Algorithm-Support Vector Machine as a Target Receptor for the Treatment of Prostate Cancer
The 17β-HSD3 enzyme plays a key role in the treatment of prostate cancer, and small inhibitors can be used to target it efficiently. In the present study, the multiple linear regression (MLR) and support vector machine (SVM) methods were used to interpret the chemical structural functionality against the inhibition activity of some 17β-HSD3 inhibitors. Chemical structural information was described thro...
Estimation of the applicability domain of kernel-based machine learning models for virtual screening
BACKGROUND The virtual screening of large compound databases is an important application of structural-activity relationship models. Due to the high structural diversity of these data sets, it is impossible for machine learning based QSAR models, which rely on a specific training set, to give reliable results for all compounds. Thus, it is important to consider the subset of the chemical space ...
Journal title:
- Journal of Computer-Aided Molecular Design
Volume 21, Issue 9
Publication date: 2007